Modeling and Analysis of SAGE Libraries
Abstract
A Serial Analysis of Gene Expression (SAGE) library is a collection of thousands of small DNA “tags”, each of which represents a distinct mRNA transcript. Existing methods analyze either single-library data (i.e., one library per group) or one tag at a time. In a multi-library setting, the practice of lumping all libraries together to form a “mega” library for each group is unsatisfactory, but it is nonetheless performed frequently for lack of alternative methods. Because the tag counts within each library are drawn from a multinomial distribution, they are inter-related, so analyzing thousands of tags one at a time is inadequate: such a practice not only ignores the dependency but also faces the multiple-testing adjustment issue. This article attempts to address both issues so that all tags from multi-library groups can be analyzed together. Focusing on the problem of identifying genes that are differentially expressed, a Bayesian formulation is established. Under this formulation, separating the differentially expressed genes from the majority of similarly expressed ones is treated as a model selection problem, and the reversible jump Markov chain Monte Carlo method is adapted for this purpose. The method is applied to a set of mouse libraries to uncover genes that are associated with the process of aging in the cerebellum. Our Gene Ontology (GO) analysis classifies the selected genes into several GO categories that appear to be functionally relevant to aging.
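The dependency among tag counts mentioned in the abstract can be made concrete with a small sketch. It assumes only the standard multinomial model (a fixed library size with per-tag probabilities); the function name, the library size, and the three tag proportions are hypothetical values chosen for illustration and are not taken from the paper.

```python
def multinomial_covariance(total_tags, probs):
    """Exact covariance matrix of tag counts under a multinomial model.

    Cov(X_i, X_j) = n * p_i * (delta_ij - p_j), where n is the library size
    and p_i is the probability of drawing tag i.
    """
    k = len(probs)
    return [
        [
            total_tags * probs[i] * ((1.0 if i == j else 0.0) - probs[j])
            for j in range(k)
        ]
        for i in range(k)
    ]


# Hypothetical library: 10,000 tags spread over three tag species.
cov = multinomial_covariance(10_000, [0.5, 0.3, 0.2])

# Off-diagonal entries are negative: a higher count for one tag
# leaves fewer draws for every other tag in the same library.
for row in cov:
    print(row)
```

Because the counts in a library must sum to the library size, every off-diagonal covariance is negative and each row of the matrix sums to zero, which is why per-tag tests that assume independence do not match the sampling model.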
Similar resources
Statistical modeling of sequencing errors in SAGE libraries.
Motivation: Sequencing errors may bias the gene expression measurements made by Serial Analysis of Gene Expression (SAGE). They may introduce non-existent tags at low abundance and decrease the real abundance of other tags. These effects are increased in the longer tags generated in LongSAGE libraries. Current sequencing technology generates quite accurate estimates of sequencing error rates. He...
The Mouse SAGE Site: database of public mouse SAGE libraries
The Mouse SAGE Site is a web-based database of all available public libraries generated by the Serial Analysis of Gene Expression (SAGE) from various mouse tissues and cell lines. The database contains mouse SAGE libraries organized in a uniform way and provides web-based tools for browsing, comparing and searching SAGE data with reliable tag-to-gene identification. A modified approach based on...
Identification and prevention of a GC content bias in SAGE libraries.
Serial Analysis of Gene Expression (SAGE) is becoming a widely used gene expression profiling method for the study of development, cancer and other human diseases. Investigators using SAGE rely heavily on the quantitative aspect of this method for cataloging gene expression and comparing multiple SAGE libraries. We have developed additional computational and statistical tools to assess the qual...
Extract-SAGE: An integrated platform for cross-analysis and GA-based selection of SAGE data
Serial analysis of gene expression (SAGE) is a powerful quantification technique for gene expression data. The huge amount of tag data in SAGE libraries of samples is difficult to analyze with current SAGE analysis tools. Data is often not provided in a biologically significant way for cross-analysis and -comparison, thus limiting its application. Hence, an integrated software platfo...
A comparative molecular analysis of developing mouse forelimbs and hindlimbs using serial analysis of gene expression (SAGE).
The analysis of differentially expressed genes is a powerful approach to elucidate the genetic mechanisms underlying the morphological and evolutionary diversity among serially homologous structures, both within the same organism (e.g., hand vs. foot) and between different species (e.g., hand vs. wing). In the developing embryo, limb-specific expression of Pitx1, Tbx4, and Tbx5 regulates the de...
Modified serial analysis of gene expression method for construction of gene expression profiles of microbial eukaryotic species.
Serial analysis of gene expression (SAGE) is a powerful approach for the identification of differentially expressed genes, providing comprehensive and quantitative gene expression profiles in the form of short tag sequences. Each tag represents a unique transcript, and the relative frequencies of tags in the SAGE library are equal to the relative proportions of the transcripts they represent. O...